Bacteria
Neural Ordinary Differential Equations for Simulating Metabolic Pathway Dynamics from Time-Series Multiomics Data
Habaraduwa, Udesh, Lixandru, Andrei
The advancement of human healthspan and bioengineering relies heavily on predicting the behavior of complex biological systems. While high-throughput multiomics data is becoming increasingly abundant, converting this data into actionable predictive models remains a bottleneck. High-capacity, datadriven simulation systems are critical in this landscape; unlike classical mechanistic models restricted by prior knowledge, these architectures can infer latent interactions directly from observational data, allowing for the simulation of temporal trajectories and the anticipation of downstream intervention effects in personalized medicine and synthetic biology. To address this challenge, we introduce Neural Ordinary Differential Equations (NODEs) as a dynamic framework for learning the complex interplay between the proteome and metabolome. We applied this framework to time-series data derived from engineered Escherichia coli strains, modeling the continuous dynamics of metabolic pathways. The proposed NODE architecture demonstrates superior performance in capturing system dynamics compared to traditional machine learning pipelines. Our results show a greater than 90% improvement in root mean squared error over baselines across both Limonene (up to 94.38% improvement) and Isopentenol (up to 97.65% improvement) pathway datasets. Furthermore, the NODE models demonstrated a 1000x acceleration in inference time, establishing them as a scalable, high-fidelity tool for the next generation of metabolic engineering and biological discovery.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.34)
Path Signatures Enable Model-Free Mapping of RNA Modifications
Lemercier, Maud, Arrubarrena, Paola, Di Giorgio, Salvatore, Brettschneider, Julia, Cass, Thomas, Vries, Isabel S. Naarmann-de, Papavasiliou, Anastasia, Ruggieri, Alessia, Tellioglu, Irem, Wu, Chia Ching, Papavasiliou, F. Nina, Lyons, Terry
Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts for which ample training data exists. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data. For each nanopore read, our approach extracts robust, modification-sensitive features from the raw ionic current signal at a site using the signature transform, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values to enable anomaly detection at both individual read and site levels. Validation on densely-modified \textit{E. coli} rRNA demonstrates that our approach detects known sites harboring diverse modification types, without prior training on these modifications. We further applyied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. For DENV sfRNA, it led to revealing a novel 2'-O-methylated site, which we validate orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNAs and datasets generated with different nanopore sequencing chemistries.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.35)
- North America > United States > Wyoming (0.05)
- Europe > Denmark (0.05)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Government > Regional Government > North America Government > United States Government (0.48)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.47)
SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
Fan, Jigang, Zhou, Zhenghong, Jin, Ruofan, Cong, Le, Wang, Mengdi, Zhang, Zaixi
Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, the first red-teaming framework designed for protein foundation models to the best of our knowledge. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The codes will be made publicly available at https://github.com/jigang-fan/SafeProtein.
- North America > United States (0.14)
- Asia > Vietnam > Bắc Kạn Province > Bắc Kạn (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN-XGBoost Ensemble
Siddiqui, Md. Saiful Bari, Tarannum, Nowshin
Antimicrobial Resistance (AMR) is a rapidly escalating global health crisis. While genomic sequencing enables rapid prediction of resistance phenotypes, current computational methods have limitations. Standard machine learning models treat the genome as an unordered collection of features, ignoring the sequential context of Single Nucleotide Polymorphisms (SNPs). State-of-the-art sequence models like Transformers are often too data-hungry and computationally expensive for the moderately-sized datasets that are typical in this domain. To address these challenges, we propose AMR-EnsembleNet, an ensemble framework that synergistically combines sequence-based and feature-based learning. We developed a lightweight, custom 1D Convolutional Neural Network (CNN) to efficiently learn predictive sequence motifs from high-dimensional SNP data. This sequence-aware model was ensembled with an XGBoost model, a powerful gradient boosting system adept at capturing complex, non-local feature interactions. We trained and evaluated our framework on a benchmark dataset of 809 E. coli strains, predicting resistance across four antibiotics with varying class imbalance. Our 1D CNN-XGBoost ensemble consistently achieved top-tier performance across all the antibiotics, reaching a Matthews Correlation Coefficient (MCC) of 0.926 for Ciprofloxacin (CIP) and the highest Macro F1-score of 0.691 for the challenging Gentamicin (GEN) AMR prediction. We also show that our model consistently focuses on SNPs within well-known AMR genes like fusA and parC, confirming it learns the correct genetic signals for resistance. Our work demonstrates that fusing a sequence-aware 1D CNN with a feature-based XGBoost model creates a powerful ensemble, overcoming the limitations of using either an order-agnostic or a standalone sequence model.
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Arizona (0.04)
- Asia > Singapore (0.04)
- Research Report (0.64)
- Overview (0.46)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.49)
Predicting Antimicrobial Resistance (AMR) in Campylobacter, a Foodborne Pathogen, and Cost Burden Analysis Using Machine Learning
Mishra, Shubham, Han, The Anh, Lopes, Bruno Silvester, Ghareeb, Shatha, Shamszaman, Zia Ush
Antimicrobial resistance (AMR) poses a significant public health and economic challenge, increasing treatment costs and reducing antibiotic effectiveness. This study employs machine learning to analyze genomic and epidemiological data from the public databases for molecular typing and microbial genome diversity (PubMLST), incorporating data from UK government-supported AMR surveillance by the Food Standards Agency and Food Standards Scotland. We identify AMR patterns in Campylobacter jejuni and Campylobacter coli isolates collected in the UK from 2001 to 2017. The research integrates whole-genome sequencing (WGS) data, epidemiological metadata, and economic projections to identify key resistance determinants and forecast future resistance trends and healthcare costs. We investigate gyrA mutations for fluoroquinolone resistance and the tet(O) gene for tetracycline resistance, training a Random Forest model validated with bootstrap resampling (1,000 samples, 95% confidence intervals), achieving 74% accuracy in predicting AMR phenotypes. Time-series forecasting models (SARIMA, SIR, and Prophet) predict a rise in campylobacteriosis cases, potentially exceeding 130 cases per 100,000 people by 2050, with an economic burden projected to surpass 1.9 billion GBP annually if left unchecked. An enhanced Random Forest system, analyzing 6,683 isolates, refines predictions by incorporating temporal patterns, uncertainty estimation, and resistance trend modeling, indicating sustained high beta-lactam resistance, increasing fluoroquinolone resistance, and fluctuating tetracycline resistance.
- Europe > United Kingdom > Scotland (0.25)
- Europe > United Kingdom > England > North Yorkshire > Middlesbrough (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.89)
- Water & Waste Management > Water Management > Constituents > Bacteria (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.57)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.57)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties
Machine learning models for molecular property prediction generally rely on representations -- such as SMILES strings and molecular graphs -- that overlook the surface-local phenomena driving intermolecular behavior. 3D-based approaches often reduce surface detail or require computationally expensive SE(3)-equivariant architectures to manage spatial variance. To overcome these limitations, this work introduces AMPTCR (Aligned Manifold Property and Topology Cloud Representation), a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. Each surface point includes a chemically meaningful scalar, geodesically derived topology vectors, and coordinates transformed into a canonical reference frame, enabling efficient learning with conventional SE(3)-sensitive architectures. AMPTCR is evaluated using a DGCNN framework on two tasks: molecular weight and bacterial growth inhibition. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R^2 of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values using Dual Fukui functions as the electronic descriptor and Morgan Fingerprints as auxiliary data, achieving an ROC AUC of 0.912 on the classification task, and an R^2 of 0.54 on the regression task. These results help demonstrate that AMPTCR offers a compact, expressive, and architecture-agnostic representation for modeling surface-mediated molecular properties.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.34)
Predicting and generating antibiotics against future pathogens with ApexOracle
Leng, Tianang, Wan, Fangping, Torres, Marcelo Der Torossian, de la Fuente-Nunez, Cesar
Antimicrobial resistance (AMR) is escalating and outpacing current antibiotic development. Thus, discovering antibiotics effective against emerging pathogens is becoming increasingly critical. However, existing approaches cannot rapidly identify effective molecules against novel pathogens or emerging drug-resistant strains. Here, we introduce ApexOracle, an artificial intelligence (AI) model that both predicts the antibacterial potency of existing compounds and designs de novo molecules active against strains it has never encountered. Departing from models that rely solely on molecular features, ApexOracle incorporates pathogen-specific context through the integration of molecular features captured via a foundational discrete diffusion language model and a dual-embedding framework that combines genomic- and literature-derived strain representations. Across diverse bacterial species and chemical modalities, ApexOracle consistently outperformed state-of-the-art approaches in activity prediction and demonstrated reliable transferability to novel pathogens with little or no antimicrobial data. Its unified representation-generation architecture further enables the in silico creation of "new-to-nature" molecules with high predicted efficacy against priority threats. By pairing rapid activity prediction with targeted molecular generation, ApexOracle offers a scalable strategy for countering AMR and preparing for future infectious-disease outbreaks.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- Africa > Niger (0.04)
- Research Report > Promising Solution (0.48)
- Overview > Innovation (0.34)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.71)
Foundation models may exhibit staged progression in novel CBRN threat disclosure
The extent to which foundation models can disclose novel chemical, biological, radiation, and nuclear (CBRN) threats to expert users is unclear due to a lack of test cases. I leveraged the unique opportunity presented by an upcoming publication describing a novel catastrophic biothreat - "Technical Report on Mirror Bacteria: Feasibility and Risks" - to conduct a small controlled study before it became public. Graduate-trained biologists tasked with predicting the consequences of releasing mirror E. coli showed no significant differences in rubric-graded accuracy using Claude Sonnet 3.5 new (n=10) or web search only (n=2); both groups scored comparably to a web baseline (28 and 43 versus 36). However, Sonnet reasoned correctly when prompted by a report author, but a smaller model, Haiku 3.5, failed even with author guidance (80 versus 5). These results suggest distinct stages of model capability: Haiku is unable to reason about mirror life even with threat-aware expert guidance (Stage 1), while Sonnet correctly reasons only with threat-aware prompting (Stage 2). Continued advances may allow future models to disclose novel CBRN threats to naive experts (Stage 3) or unskilled users (Stage 4). While mirror life represents only one case study, monitoring new models' ability to reason about privately known threats may allow protective measures to be implemented before widespread disclosure.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Developing cholera outbreak forecasting through qualitative dynamics: Insights into Malawi case study
Ghosh, Adrita, Das, Parthasakha, Chakraborty, Tanujit, Das, Pritha, Ghosh, Dibakar
Cholera, an acute diarrheal disease, is a serious concern in developing and underdeveloped areas. A qualitative understanding of cholera epidemics aims to foresee transmission patterns based on reported data and mechanistic models. The mechanistic model is a crucial tool for capturing the dynamics of disease transmission and population spread. However, using real-time cholera cases is essential for forecasting the transmission trend. This prospective study seeks to furnish insights into transmission trends through qualitative dynamics followed by machine learning-based forecasting. The Monte Carlo Markov Chain approach is employed to calibrate the proposed mechanistic model. We identify critical parameters that illustrate the disease's dynamics using partial rank correlation coefficient-based sensitivity analysis. The basic reproduction number as a crucial threshold measures asymptotic dynamics. Furthermore, forward bifurcation directs the stability of the infection state, and Hopf bifurcation suggests that trends in transmission may become unpredictable as societal disinfection rates rise. Further, we develop epidemic-informed machine learning models by incorporating mechanistic cholera dynamics into autoregressive integrated moving averages and autoregressive neural networks. We forecast short-term future cholera cases in Malawi by implementing the proposed epidemic-informed machine learning models to support this. We assert that integrating temporal dynamics into the machine learning models can enhance the capabilities of cholera forecasting models. The execution of this mechanism can significantly influence future trends in cholera transmission. This evolving approach can also be beneficial for policymakers to interpret and respond to potential disease systems. Moreover, our methodology is replicable and adaptable, encouraging future research on disease dynamics.
- Africa > Malawi (0.62)
- North America > Haiti (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.06)
- (11 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Water & Waste Management > Water Management > Constituents > Bacteria (0.48)